Pilot Experiment

Descriptives

Task description

Each experiment contains 6 trials. Each trial contains 6 training slides (non-ambiguous utterance) and 1 test slide (ambiguous utterance), as shown below.

Figure: training slide (left) and test slide (right).

The items in a trial are drawn from a set of 3 categories (e.g. musical instruments, fruits, vehicles), and each of the 6 trials uses a different set of categories. We run the experiment with distributions (6-0-0), (4-2-0), and (2-2-2), where each number is how many of the trial's 6 training slides show an item from a given category.
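As a rough illustration of what a distribution means for a single trial, the sketch below builds the sequence of category labels for the 6 training slides. The function name, the random assignment of counts to categories, and the fully shuffled order are assumptions made for illustration, not a description of the actual experiment code.

```python
import random

# Example categories taken from the task description.
CATEGORIES = ["musical instruments", "fruits", "vehicles"]

def make_training_sequence(distribution, categories, rng):
    """Return one category label per training slide, in shuffled order.

    `distribution` is a tuple such as (4, 2, 0): one count per category.
    Which category receives which count is chosen at random (an assumption).
    """
    if len(distribution) != len(categories):
        raise ValueError("need exactly one count per category")
    assigned = rng.sample(categories, k=len(categories))
    sequence = [cat for cat, n in zip(assigned, distribution) for _ in range(n)]
    rng.shuffle(sequence)
    return sequence

rng = random.Random(0)  # fixed seed so the sketch is reproducible
for dist in [(6, 0, 0), (4, 2, 0), (2, 2, 2)]:
    print(dist, make_training_sequence(dist, CATEGORIES, rng))
```

Each sequence has 6 entries, matching the 6 training slides per trial.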

Sample size

We exclude responses that score below 5/6 correct on the training slides. If multiple responses share an IP address, we keep only the first.
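The exclusion rules could be applied along the following lines. This is a minimal sketch: the field names (`ip`, `n_correct`, `n_training`), the assumption that responses arrive in submission order, and the choice to deduplicate by IP before checking accuracy are all hypothetical, since the notes do not specify them.

```python
def filter_responses(responses):
    """Keep the first response per IP, then drop those below 5/6 training accuracy.

    `responses` is a list of dicts in submission order (assumed), each with
    hypothetical keys "ip", "n_correct", and "n_training".
    """
    seen_ips = set()
    kept = []
    for r in responses:
        if r["ip"] in seen_ips:
            continue  # later response from an IP we have already seen
        seen_ips.add(r["ip"])
        # Integer comparison equivalent to n_correct / n_training >= 5/6.
        if r["n_correct"] * 6 >= r["n_training"] * 5:
            kept.append(r)
    return kept

# Made-up rows for illustration only; these are not the pilot's data.
example = [
    {"ip": "10.0.0.1", "n_correct": 6, "n_training": 6},
    {"ip": "10.0.0.1", "n_correct": 6, "n_training": 6},  # duplicate IP
    {"ip": "10.0.0.2", "n_correct": 4, "n_training": 6},  # below threshold
    {"ip": "10.0.0.3", "n_correct": 5, "n_training": 6},
]
print(len(filter_responses(example)))
```

Deduplicating first means a participant whose first attempt fails the accuracy check is excluded entirely, even if a later attempt would pass. Reversing the two steps would give a different sample.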

n = 19

Finding

In distribution (4-2-0), responses are graded: how often a category is chosen tracks how often it appeared in training. This is consistent with our hypothesis that people use common ground to identify the referent of an ambiguous utterance.

However, this result could also be driven by a recency effect where people simply choose according to the category of the last seen item.

People choose the test-slide item from the same category as the last training item at above-chance rates, suggesting a possibly strong recency effect.
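One way to make "above chance" concrete is an exact one-sided binomial test. The sketch below is illustrative only: the counts are placeholders rather than the pilot's data, and the chance level of 1/3 assumes the test slide offers three equally likely options, which the notes do not state.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), i.e. a one-sided exact p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical counts for illustration only; NOT the pilot's data.
n_test_responses = 30
n_matching_last_item = 16
chance = 1 / 3  # assumption: three equally likely options on the test slide

p_value = binomial_upper_tail(n_matching_last_item, n_test_responses, chance)
print(f"one-sided p = {p_value:.4f}")
```

Because each participant contributes several trials, responses are not independent, so a test like this is at best a first check.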

Recency Effect Experiment

Descriptives

Task description

To look at the recency effect more closely, we modify the experiment to use distribution (5-1-0), where the last item preceding the test slide is never in the most frequent category. We also use 2 new phrasings for the test slide, “Here’s the last one” and “Look at that”. Distribution (2-2-2), with all category appearances randomized, is used as the control.
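The constraint on the last training item can be sketched as follows. The function name and the dict-based input are assumptions for illustration. The approach picks one item outside the most frequent category to go last and shuffles the rest.

```python
import random

def make_sequence_last_not_modal(counts_by_category, rng):
    """Order category labels so the final one is not from the most frequent category."""
    modal = max(counts_by_category, key=counts_by_category.get)
    items = [cat for cat, n in counts_by_category.items() for _ in range(n)]
    # Positions of items that may legally appear last.
    candidates = [i for i, cat in enumerate(items) if cat != modal]
    if not candidates:
        raise ValueError("no item outside the most frequent category")
    last = items.pop(rng.choice(candidates))
    rng.shuffle(items)
    return items + [last]

rng = random.Random(0)
# Category names "A", "B", "C" are placeholders.
print(make_sequence_last_not_modal({"A": 5, "B": 1, "C": 0}, rng))
```

Under (5-1-0) the constraint leaves only one possible category order: the five items from the frequent category come first, followed by the single item from the other category. The randomness in this condition therefore comes only from which categories and items fill those slots.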

Figures: training slides (left) and test slides (right).

Sample size

We exclude responses that score below 5/6 correct on the training slides. If multiple responses share an IP address, we keep only the first.

n = 81

Finding

Distribution (5-1-0)

The recency effect is reduced when “Here’s the last one” and “Look at that” are used. In the “Here’s the last one” condition, we observe a larger increase in the proportion of people choosing the item from the least frequent category, perhaps because “the last one” can be interpreted as the item from the category that has not yet appeared. “Look at that” comes closest to the graded response we would expect if people use common ground to identify the referent.

Moving forward

We keep “Look at that” as the phrasing for the test slide.

We run 2 experiments with adults, with the same setup:

  1. (6-0-0) vs (4-2-0) vs (2-2-2)
  2. (3-0-0) vs (2-1-0) vs (1-1-1)

to test the effect of relative and absolute strength.
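The pairing of the two experiments can be checked with a short calculation. The definitions used here are one possible reading, not necessarily the intended one: "absolute strength" is taken as the raw count of the most frequent category, and "relative strength" as that count's share of all training items.

```python
from fractions import Fraction

def strength(distribution):
    """Return (absolute, relative) strength of the most frequent category.

    Both definitions are assumptions made for this sketch.
    """
    top = max(distribution)
    return top, Fraction(top, sum(distribution))

pairs = [((6, 0, 0), (3, 0, 0)), ((4, 2, 0), (2, 1, 0)), ((2, 2, 2), (1, 1, 1))]
for long_dist, short_dist in pairs:
    abs_long, rel_long = strength(long_dist)
    abs_short, rel_short = strength(short_dist)
    print(long_dist, short_dist, "absolute:", abs_long, abs_short,
          "relative:", rel_long, rel_short)
```

Under these definitions, each distribution in experiment 2 matches its counterpart in experiment 1 on relative strength (1, 2/3, 1/3) while halving the absolute count, so comparing the two experiments separates the two factors.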

We run a 3rd experiment with other distributions of interest, such as (1-0-0) and (3-2-1).

We start piloting with kids, using the distributions (6-0-0) vs (4-2-0) vs (2-2-2).